Introduction


Over the years, statistical significance has been the cornerstone of 
inferential statistics. In testing a treatment effect, the null hypothesis 
is often stated as ‘no effect’ and the alternative as ‘there is an 
effect’. The significance test yields a p-value that is usually compared 
with the conventional threshold of 0.05. If the p-value is less than 
0.05, the null hypothesis is rejected and statistical significance is 
said to be established. Obtaining a significant result is a tremendous 
accomplishment, but it does not tell the entire story behind the results.


For example, suppose we want to test whether a new drug is effective in 
the management of hypertension. We would state the null hypothesis as 
‘there is no change’ (mean difference = 0) and the alternative 
hypothesis as ‘there is a change’ (mean difference ≠ 0). In a single-arm 
study, we would select a sample of n patients with hypertension, 
record their baseline blood pressure (BP), administer the drug for 
about two weeks, and measure their BP again at the end of that period. 
To assess the efficacy of the drug, the paired samples t-test would 
usually be used. [1] In this test, the difference in each patient’s BP 
is computed and the mean of these pairwise differences is compared 
against 0. Based on the mean difference (M), the standard deviation of 
the differences (S) and the sample size (n), a t-statistic is computed as:


$$t = \left(\frac{M}{S}\right) \times \sqrt{n} \qquad (1)$$
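As a minimal sketch of equation (1), the Python below computes the pairwise differences, M, S and t using only the standard library. The helper name `paired_t` and the blood-pressure readings are hypothetical, invented purely for illustration; the differences are taken as after minus before, so a fall in BP gives a negative M and t.

```python
import math
import statistics

def paired_t(before, after):
    """Return (M, S, n, t) for a paired-samples t-test, following equation (1)."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    # Pairwise difference for each patient (assumed direction: after minus before).
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    m = statistics.mean(diffs)   # mean difference, M
    s = statistics.stdev(diffs)  # sample standard deviation of the differences, S
    t = (m / s) * math.sqrt(n)   # equation (1)
    return m, s, n, t

# Hypothetical systolic BP readings (mmHg), invented for illustration only.
before = [150, 142, 160, 155, 148, 152]
after = [141, 140, 151, 150, 146, 143]
m, s, n, t = paired_t(before, after)
print(f"M = {m:.2f}, S = {s:.2f}, n = {n}, t = {t:.2f}")
```

For real analyses, a library routine such as SciPy's `scipy.stats.ttest_rel` computes the same paired statistic together with its p-value.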


In (1), t is a continuous value that can range from negative to positive 
and, under the null hypothesis, follows a t-distribution with n − 1 
degrees of freedom. The curve is symmetrical and ‘bell-shaped’, peaking 
at 0 and tapering off to the right and left of 0. When there is no change 
at all, M will be 0 and t will be 0 too. The area under the curve to the 
right of t = 0, which is the one-sided p-value, will then be 0.5 (half the 
area under the curve). Conventionally, when the p-value is < 0.05, the 
following is inferred: t is large, M is far from 0, and hence there is a 
significant change in BP. If the change is in the desirable direction, 
the drug is said to be effective. For practical purposes, when |t| > 2, 
the p-value is taken as less than 0.05.
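The link between t and the p-value can be sketched with the standard library alone by integrating the t-distribution's density numerically (Simpson's rule here, a choice made for this sketch rather than anything the text prescribes). The function names `t_pdf` and `two_sided_p` are hypothetical. The printout suggests the |t| > 2 rule is an approximation: with n = 25 (24 degrees of freedom) the exact two-sided cut-off is about 2.06, and it approaches 1.96 only for large samples.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: area in both tails beyond |t|, via Simpson's rule."""
    a, b = 0.0, abs(t)
    if b == 0:
        return 1.0  # t = 0: each tail holds half the area, so p = 1
    h = (b - a) / steps  # steps must be even for Simpson's rule
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    central = total * h / 3  # area between 0 and |t|
    return 1.0 - 2.0 * central

for t, df in [(0.0, 24), (2.0, 24), (2.0, 1000), (2.5, 24)]:
    print(f"t = {t:.1f}, df = {df}: p = {two_sided_p(t, df):.4f}")
```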


Example: 1 


M = 5.0, S = 10.0, n = 25: $t = \left(\frac{5.0}{10.0}\right) \times \sqrt{25} = 2.5$, so the p-value will be < 0.05.


From equation (1), note that even when M is small in absolute value, a 
large n will push t far from 0 and the p-value below 0.05, leading us to 
the same conclusion: there is a significant change in mean BP.


Example: 2 


M = 1.1, S = 10.0, n = 400: $t = \left(\frac{1.1}{10.0}\right) \times \sqrt{400} = 2.2$, so the p-value will again be < 0.05.
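Applying equation (1) to the summary statistics of both examples takes only a few lines. This sketch uses a hypothetical helper `t_from_summary` and the text's |t| > 2 rule of thumb in place of an exact p-value.

```python
import math

def t_from_summary(m, s, n):
    """Equation (1): t = (M / S) * sqrt(n)."""
    return (m / s) * math.sqrt(n)

examples = {"Example 1": (5.0, 10.0, 25), "Example 2": (1.1, 10.0, 400)}
for name, (m, s, n) in examples.items():
    t = t_from_summary(m, s, n)
    # Rule of thumb from the text: |t| > 2 is taken as p < 0.05.
    verdict = "p < 0.05" if abs(t) > 2 else "p >= 0.05 (approx.)"
    print(f"{name}: t = {t:.2f} -> {verdict}")
```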


But is a mean reduction of 1.1 mmHg of any clinical importance? This 
makes us question the use of evidence presented in the form of p-values 
alone. It appears that almost any effect, however small, can be made 
‘significant’ as long as the sample size is large enough.
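To make the sample-size point concrete, the sketch below solves (|M|/S)√n > 2 for the smallest n, again leaning on the |t| > 2 rule of thumb rather than exact critical values. The function `min_n_for_significance` is a hypothetical name used only here; for S = 10, it reports 17 patients for M = 5.0 but 331 for M = 1.1.

```python
import math

def min_n_for_significance(m, s, threshold=2.0):
    """Smallest n (at least 2) whose t = (|M|/S) * sqrt(n) exceeds `threshold`.

    Uses the rule of thumb |t| > 2 ~ p < 0.05; it ignores the fact that the
    exact critical value depends on the degrees of freedom.
    """
    if m == 0:
        raise ValueError("a zero mean difference never crosses the threshold")
    es = abs(m) / s  # effect size M / S, in absolute value
    # First guess from n > (threshold / ES)^2.
    n = max(2, math.ceil((threshold / es) ** 2))
    # Nudge n to guard against floating-point rounding at the boundary.
    while n > 2 and es * math.sqrt(n - 1) > threshold:
        n -= 1
    while es * math.sqrt(n) <= threshold:
        n += 1
    return n

for m in (5.0, 1.1, 0.5):
    print(f"M = {m}, S = 10: n needed = {min_n_for_significance(m, 10.0)}")
```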


In equation (1), the term $\frac{M}{S}$ is called the effect size (ES). 
Note that it is not amplified by the sample size; that is, ES reflects 
the real difference irrespective of sample size. Effect size is typically 
expressed as Cohen’s d. Cohen described a small effect size as 0.2, a 
medium effect size as 0.5 and a large effect size as 0.8 [2]. In example 
(1), ES = 5.0/10.0 = 0.5, which is medium, and in example (2), ES = 
1.1/10.0 = 0.11, which is very small. Yet based on the p-value alone, 
both are statistically significant results. This highlights an important 
point: do not assess evidence from the statistical point of view alone; 
look at the practical evidence as well. In 2014, an editorial in Basic and 
Applied Social Psychology argued that the null hypothesis significance 
testing procedure should be discouraged. [3]
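The effect size and Cohen's benchmarks can be expressed as a small sketch. The helpers `cohens_d` and `describe_effect` are hypothetical, and treating 0.2, 0.5 and 0.8 as hard cut-offs (with anything below 0.2 labelled ‘below small’) is an assumption of this sketch; Cohen offered them as rough guides. For a paired design, M/S is sometimes written d_z to distinguish it from the two-group d.

```python
def cohens_d(m, s):
    """Effect size for a paired design: mean difference / SD of the differences."""
    return m / s

def describe_effect(d):
    """Label |d| using Cohen's benchmarks (assumed cut-offs: 0.2, 0.5, 0.8)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"

for m in (5.0, 1.1):
    d = cohens_d(m, 10.0)
    print(f"M = {m}: d = {d:.2f} ({describe_effect(d)})")
```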


Other researchers recommend reporting confidence intervals (CI), which 
incorporate precision (the margin of error, ME) as M ± ME [4]. For 
practical purposes, the ME for a mean is $ME = 2\left(\frac{S}{\sqrt{n}}\right)$, 
where 2 is the approximate multiplier for a 95% CI. For example, a 95% 
CI of [5, 8] for the mean difference would be interpreted as ‘we are 95% 
confident that the true mean difference lies between 5 and 8’.
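The approximate interval M ± 2(S/√n) translates directly into code. In this sketch, `approx_ci95` is a hypothetical name and the multiplier 2 is the text's approximation; an exact interval would use the t critical value for n - 1 degrees of freedom (about 2.06 when n = 25), giving a slightly wider interval.

```python
import math

def approx_ci95(m, s, n, multiplier=2.0):
    """Approximate 95% CI for a mean difference: M +/- multiplier * S / sqrt(n)."""
    me = multiplier * s / math.sqrt(n)  # margin of error, ME
    return m - me, m + me

for m, s, n in [(5.0, 10.0, 25), (1.1, 10.0, 400)]:
    lo, hi = approx_ci95(m, s, n)
    print(f"M = {m}, n = {n}: 95% CI ~ [{lo:.1f}, {hi:.1f}]")
```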


In example (1), $ME = 2\left(\frac{10.0}{\sqrt{25}}\right) = 2 \times 2.0 = 4.0$. Hence, the 95% CI is 5.0 ± 4.0 = [1.0, 9.0].


In example (2), $ME = 2\left(\frac{10.0}{\sqrt{400}}\right) = 2 \times 0.5 = 1.0$. Hence, the 95% CI is 1.1 ± 1.0 = [0.1, 2.1]. 
In example (2), the ME is a quarter of that in example (1), so the interval 
is much narrower (more precise), which is good. However, looking at the 
evidence in example (2), the lower limit of 0.1 only just excludes 0, 
indicating that the true change may be close to none at all.


Results for example (1) would be reported as ‘the mean difference is 5.0 
(95% CI = 1.0, 9.0, p < 0.05)’. Stating the CI as well gives a better 
indication of the drug effect. In this case, the drug is effective.


Results for example (2) would be reported as ‘the mean difference is 1.1 
(95% CI = 0.1, 2.1, p < 0.05)’. It is clear that the large sample size 
drives the p-value below 0.05 (statistical evidence), but the effect is 
small (clinical evidence).


According to Cumming, researchers should always report confidence 
intervals, as they convey what a p-value does not: the magnitude and 
relative importance of an effect. [4]